ABSTRACT

This paper attempts to characterize and present a 
state of the art view of several quantitative models 
and metrics of the software life cycle. These models 
and metrics can be used to aid in managing and engineering
software projects. They deal with various
aspects of the software process and product, including 
resource allocation and estimation, changes and errors, 
size, complexity and reliability. Some indication is 
given of the extent to which the various models have 
been used and the success they have achieved. 


INTRODUCTION 

The past few years have seen the emergence of a 
new quantitative approach to software management and 
software engineering. It includes the use of models
and metrics based on historical data and experience. 

It covers resource estimation and planning, cost, 
personnel allocation, computer use, and quality 
assurance measures for size, structure and reliability 
of the product.

A quantitative methodology is clearly needed to
aid in the software development process. It is
needed for understanding and comparison. It was said
by Lord Kelvin that if you cannot measure something, 
then you do not understand it. This is certainly 
true in the software development domain and is the 
reason why various models and metrics have been de- 
veloped, tested, refined and established as aids. One 
needs models and quantification for comparisons. In 
cost tradeoffs, for example, it is important to know 
whether to add another feature, how much an extra 
level of reliability will cost, or whether a modi- 
fication to an existing system will be cost effective. 

It should be noted, however, that the quanti- 
tative approach should augment and not replace good 
management and engineering judgment. Models and metrics 
are only tools for the good manager and engineer. This 
is especially true since the state of the art is newly 
emerging and not yet well established. Some models 
and metrics have only been proposed but not fully 
tested. Others have been tested only in the environ- 
ment in which they have been developed. However, more 
and more are being tested and used in environments 
other than that of the developer. In this paper, some 
indication of the level of experience with the models 
or metrics discussed will be given. 

Models and metrics must be established via sound 
testing and experimentation and, before using a model, 
the manager or engineer should have sufficient know- 
ledge about how much to trust the results of the model. 
This requires insight into the model, a known confidence
level with regard to its reliability and, most
important, knowledge of the activity being modeled.


None of these models is a black box, and none should be
treated as such. Thus, before applying any model, the
user should know the nature of his project, whether the
assumptions of the model match the environment of his
project, and the weaknesses of the model so that he can
be careful in evaluating the results.

In what follows, we will cover a large, though by 
no means exhaustive, set of models. The emphasis will 
be on those areas where quantitative management can 
give the greatest payoff. We will discuss process- 
oriented measures such as size, complexity, and relia- 
bility. Each of the measures will be treated to varying 
degrees. The emphasis will be on categorizing the 
measures, defining a typical measure or set in the 
category, and pointing out other measures only when they 
are different. The references in the back of the paper 
should help the interested reader pursue a particular 
measure further or find additional measures not mention- 
ed in this paper. 

PROCESS MEASURES 

Resources 

It is important that we have a better understand- 
ing of the software development process and be able to 
control the distribution of resources such as computer 
time, personnel, and dollars. We are also interested 
in the effect of various methodologies on the software 
development process and how they change the distribu- 
tion of resources. For this reason, we are interested 
in knowing the ideal resource allocation, how it may be 
modified to fit the local environment, the effect of 
various tradeoffs, and what changes should be made in 
the methodology or environment to minimize resources 
expenditure. 

There has been a fair amount of work towards de- 
veloping different kinds of resource models. These 
models vary in what they provide (e.g., total cost, 
manning schedule) and what factors they use to calculate 
their estimates. They also vary with regard to the type 
of formula, parameters, use of previous data, and 
staffing considerations. In an attempt to characterize 
the models, we will define the following set of attri- 
bute pairs. Models can be characterized by the type of 
formula they use to calculate total effort. A single
variable model uses one basic variable as a predictor
of effort, while a multi-variable model uses several
variables. A model may be static with regard to staffing,
which means a constant formula is used to determine
staffing levels for each activity, or it may be dynamic,
implying staffing level is part of the effort formula
itself. Within the static multi-variable models, there
are various subcategories: adjusted baseline, adjusted
table-driven, and multi-parameter equation. The
adjusted baseline uses a single variable baseline
equation which is adjusted in some way by a set of other
variables. An adjusted table-driven model uses a
baseline estimate which is adjusted by a set of
variables where the relationships are defined in
tables built from historical data. A multi-parameter
model contains a base formula which uses several
variables. A model may be based upon historical data or
derived theoretically. An historical model uses data
from previous projects to evaluate the current project
and derives the weights and basic formula from analysis
of that data. For a theoretical model, the formula
is based upon assumptions about such things as how 
people solve problems. One last categorization is that 
some models are macro models, which means they are 
based upon a view of the big picture, while others are 
micro models in that the effort equation is derived 
from knowledge of small pieces of information scaled 
up. We will try to discuss at least one model in each 
of these categories. 

Static single variable models. The most common
approach to estimating effort is to make it a function
of a single variable, project size (e.g., the number
of source instructions or object instructions). The
baseline effort equation is of the form

    $$ \mathrm{EFFORT} = a \cdot \mathrm{SIZE}^{b} $$

where a and b are constants. The constants are determined
by regression analysis applied to historical
data. In an attempt to measure the rate of production 
of lines of code by project as influenced by a number 
of product conditions and requirements, Walston and 
Felix (1) at IBM Federal Systems Division started with 
this basic model on a data base of 60 projects of 
4,000 to 467,000 source lines of code covering an 
effort of 12 to 11,758 man-months. The basic relation
they derived was

    $$ E = 5.2\,L^{0.91} $$


where E is the total effort in man months and L is the 
size in thousands of lines of delivered source code, 
including comments. Besides this basic relationship,
other relations were defined. These include the
relationships between documentation DOC (in pages) and
delivered source lines

    $$ \mathrm{DOC} = 49\,L^{1.01} $$

project duration D (in calendar months) and lines of
code

    $$ D = 4.1\,L^{0.36} $$

project duration and effort

    $$ D = 2.47\,E^{0.35} $$


and average staff size S (total staff months of effort/
duration) and effort

    $$ S = 0.54\,E^{0.6} $$

The constants a and b are not general constants.
They are derived from the historical data of the
organization (in this case, IBM Federal Systems
Division). They are not necessarily transportable to
another organization with a different environment. For
example, the Software Engineering Laboratory (SEL), on a
data base consisting of 15 projects of 1.5 to 112
thousand source lines of code covering efforts of 1.8
to 116 staff months, has calculated for its environment
the following set of equations (2):

    $$ E = 1.4\,L^{0.94} $$
    $$ \mathrm{DOC} = 29.5\,L^{0.92} $$
    $$ D = 4.4\,L^{0.267} $$
    $$ D = 4.4\,E^{0.26} $$
    $$ S = 0.23\,E^{0.74} $$

Some other variables, including different ways of counting
code, were measured by the Software Engineering
Laboratory and the equations derived are given here.
Letting DL = number of developed, delivered lines of
source code (new code + 20% of reused code), M = number
of modules, DM = total number of developed modules (all
new or more than 20% new) we have

    $$ E = 1.58\,\mathrm{DL}^{0.96}, \quad E = 0.063\,M^{1.186}, \quad E = 0.19\,\mathrm{DM}^{1.0} $$
    $$ D = 4.6\,\mathrm{DL}^{0.28}, \quad D = 2.0\,M^{0.33}, \quad D = 2.5\,\mathrm{DM}^{0.3} $$
    $$ D = 2.0\,D^{0.26} $$
    $$ \mathrm{DOC} = 35.7\,\mathrm{DL}^{0.92}, \quad \mathrm{DOC} = 1.5\,M^{1.17}, \quad \mathrm{DOC} = 4.8\,\mathrm{DM}^{\ldots} $$

Most of the SEL equations lie within one standard
error of the IBM equation and, since the SEL environment
involves the development of more standardized
software (software the organization has experience in
building), the lower effort for more lines of code seems
natural. It is also worth noting that the basic effort
vs. lines-of-code equation is almost linear for the
SEL, more linear than the Walston/Felix equation.
Remember that the project sizes are in the lower range of
the IBM data. Lawrence and Jeffery (3) have studied
even smaller projects and discovered that their data
fits a straight line quite well, i.e., their baseline
effort equation is of the form

    $$ \mathrm{EFFORT} = a \cdot \mathrm{SIZE} + b $$

where again a and b are constants derived from historical
data. The implication here is that the equation becomes
more linear as the project sizes decrease.

Static multi-variable models. Another approach to
effort estimation is what we will call the static
multi-variable model. A resource estimate here is
multi-variable because it is based on several parameters, and
static because a single effort value is calculated by
the model formula. These models fall into several
subcategories. Some start with the baseline equation just
discussed based on historical data and adjust the initial
estimate by a set of variables which attempt to
incorporate the effects of important product and process
attributes. In other models, the baseline equation
itself involves more than one variable.


The models in the adjusted baseline class differ in 
the set of attributes that they consider important to 
their application area and development environment, the 
weights assigned to the attributes, and the constants 
of the baseline equations. 
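
To make the role of the baseline constants concrete, the following sketch evaluates the single-variable form EFFORT = a * SIZE^b for two sets of constants. The function name is hypothetical, the IBM constants (5.2, 0.91) are those quoted for Walston and Felix, and the SEL constants (1.4, 0.94) are an assumed reading of the SEL effort equation; neither set should be expected to transfer to another environment.

```python
def baseline_effort(size_kloc: float, a: float, b: float) -> float:
    """Effort in staff-months from EFFORT = a * SIZE**b (SIZE in thousands of lines)."""
    if size_kloc <= 0:
        raise ValueError("size_kloc must be positive")
    return a * size_kloc ** b


# Constants are organization-specific; these are the values quoted in the text.
IBM_FSD = (5.2, 0.91)  # Walston-Felix effort equation
SEL = (1.4, 0.94)      # SEL effort equation (assumed reading of the source)

if __name__ == "__main__":
    for kloc in (10, 50, 100):
        ibm = baseline_effort(kloc, *IBM_FSD)
        sel = baseline_effort(kloc, *SEL)
        print(f"{kloc:4d} KLOC: IBM {ibm:7.1f} staff-months, SEL {sel:6.1f} staff-months")
```

The same size yields very different estimates under the two sets of constants, which is why the constants have to be recalibrated from local historical data before the adjustments described here are applied.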

Walston and Felix (1) calculated a productivity 
index by choosing 29 variables that showed a significantly
high correlation with productivity in their environment.
It was suggested that these be used in
estimating and were combined in a productivity index

    $$ I = \sum_{i=1}^{29} w_i x_i $$

where $I$ is the productivity index, $w_i$ is a factor
weight based upon the productivity change for factor i,
and $x_i$ = +1, 0, or -1, depending on whether the factor
indicates increased, nominal or decreased productivity.
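
A minimal sketch of such an index, assuming it is the weighted sum of the factor ratings. The weights below are invented for illustration and are not the Walston-Felix weights; only three of the 29 factors are shown.

```python
def productivity_index(weights, ratings):
    """Weighted sum of factor ratings, each rating being +1, 0, or -1."""
    if len(weights) != len(ratings):
        raise ValueError("weights and ratings must have the same length")
    if any(x not in (-1, 0, 1) for x in ratings):
        raise ValueError("each rating must be +1, 0, or -1")
    return sum(w * x for w, x in zip(weights, ratings))


# Hypothetical weights for three factors (illustrative values only).
weights = [2.0, 1.5, 0.5]
# +1 = increased, 0 = nominal, -1 = decreased productivity.
ratings = [1, 0, -1]

index = productivity_index(weights, ratings)
print(index)
```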


One model that fits into the single-parameter 
baseline equation with a set of adjusted multipliers 
is the model of Boehm (4), whose baseline effort 
estimate relies only upon project size. His set of
attributes is grouped under four areas: (1) product:
required fault freedom, data base size, product
complexity, adaptation from existing software; (2) computer:
execution time constraint, machine storage
constraint, virtual machine volatility, computer
response time; (3) personnel: analyst capability,
applications experience, programmer capability, virtual
machine experience, programming language experience;
(4) project: modern programming practices, use of
software tools, required development schedule. For
each attribute Boehm gives a set of ratings ranging
from very low to very high and, for most of the
attributes, a quantitative measure describing each rating.
The ratings are meant to be as objective as possible
(hence the quantitative definitions), so that the
person who must assign the ratings will have some
intuition as to why each attribute could have a
significant effect on the total effort. In two of the cases
where quantitative measures are not possible, required
fault freedom and product complexity, Boehm provides
a chart describing the effect on the development
activities or the characteristics of the code
corresponding to each rating. Associated with the ratings
is a chart of multipliers ranging from about .1 to 1.8.
Another model which falls into this category is the
model of Doty (5). The Doty model, however, provides
a different set of weights for different applications,
as well as two ways to estimate size.
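
The adjusted baseline idea can be sketched as a size-only baseline multiplied by one factor per rated attribute. Everything numeric below is hypothetical: the baseline constants and the multiplier values are placeholders, not figures from Boehm's or Doty's charts.

```python
from math import prod


def adjusted_effort(size_kloc, a, b, multipliers):
    """Baseline a * SIZE**b scaled by the product of attribute multipliers."""
    baseline = a * size_kloc ** b
    return baseline * prod(multipliers.values())


# Hypothetical multipliers chosen by rating each attribute (1.0 = nominal).
multipliers = {
    "required_fault_freedom": 1.15,  # rated high
    "analyst_capability": 0.86,      # rated high, so it lowers effort
    "use_of_software_tools": 1.00,   # rated nominal
}

# Hypothetical baseline constants; a real model calibrates them from data.
estimate = adjusted_effort(size_kloc=32, a=3.0, b=1.0, multipliers=multipliers)
print(f"{estimate:.1f} staff-months")
```

Keeping the multipliers in a named mapping makes it easy to see which attribute ratings drive an estimate away from the baseline.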


One model which falls into the category of
adjusted table-driven is that of Wolverton (6). Here
the basic algorithm involves categorizing the software
routines. The categories include control, I/O, pre- or
post-algorithm processor, algorithm, data management,
and time critical routines. Each of these
categories has its own cost-of-development curve, depending
upon the degree of difficulty (easy, medium, or
hard) and the newness of the application (new or old).
The cost is then the number of instructions by category
and degree of difficulty times the corresponding
cost taken from a table. Another model of this type,
but more simplistic, is that of Aron (7).
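
A table-driven estimate of this kind reduces to a lookup and a sum. The sketch below assumes a cost-per-instruction table keyed by category, difficulty, and newness; the table entries are invented for illustration and are not Wolverton's figures.

```python
# Hypothetical cost table: dollars per instruction, keyed by
# (category, difficulty, newness). Real tables come from historical data.
COST_TABLE = {
    ("control", "easy", "old"): 20.0,
    ("control", "hard", "new"): 45.0,
    ("io", "medium", "new"): 30.0,
    ("algorithm", "hard", "new"): 60.0,
}


def table_driven_cost(routines, table):
    """Sum instructions * cost-per-instruction over all routines."""
    total = 0.0
    for category, difficulty, newness, instructions in routines:
        total += instructions * table[(category, difficulty, newness)]
    return total


routines = [
    ("control", "easy", "old", 1000),
    ("io", "medium", "new", 2000),
    ("algorithm", "hard", "new", 500),
]
estimate = table_driven_cost(routines, COST_TABLE)
print(f"${estimate:,.0f}")
```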


The GRC model (8) involves a set of equations
derived from historical data and theory for the
various activities, several of which are multi-parameter
equations of more than one variable. For
example, the equation for code development is

    $$ MM_{CD} = 0.9773 \times N_{OF}^{1.2583} \times e^{-0.08953 \times Y_{EXP}} $$

where $MM_{CD}$ is the baseline staff months for the code
development task group for a subsystem, $N_{OF}$ is the
number of output formats for a subsystem and $Y_{EXP}$ is
the average years of staff experience in code
development. It is worth noting that size of the code
is not a factor in this formula. Other formulas exist
for the effort involved in analysis and design, system
level testing, documentation, installation, training,
project control, elapsed time and a reasonable check
for the total staff months for the project ($MM_{PROJ}$)

    $$ MM_{PROJ} = 0.0218 \times ((2 + N_{OF}) \times \ln(2 + N_{OF}))^{1.?1} $$

where $N_{OF}$ is as defined above.
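
As a rough illustration of a multi-parameter equation, the sketch below evaluates the code development formula, assuming the experience term enters as an exponential factor (an assumption about the printed equation, which is hard to read). The function name and sample inputs are hypothetical.

```python
import math


def grc_code_development_mm(n_output_formats: float, years_experience: float) -> float:
    """Baseline staff-months for code development of one subsystem.

    Assumed form: 0.9773 * N_OF**1.2583 * exp(-0.08953 * Y_EXP).
    """
    if n_output_formats <= 0:
        raise ValueError("n_output_formats must be positive")
    return 0.9773 * n_output_formats ** 1.2583 * math.exp(-0.08953 * years_experience)


# Same subsystem (20 output formats) staffed with differing experience.
for years in (1, 5, 10):
    mm = grc_code_development_mm(20, years)
    print(f"{years:2d} years experience: {mm:5.1f} staff-months")
```

Under this form, each additional year of average experience reduces the estimate by a fixed percentage, while code size does not appear at all.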




Dynamic multi-variable models. Once an effort
estimate is made, the next question of concern is how
to assign people to the project so that the deadlines
for the various development activities will be met.
Here again there are basically two approaches: the
one empirical, the other theoretical. Each of the
methods discussed so far uses the empirical approach,
which tries to identify the activities which are a
part of the development process of a typical project
for their software house. Then, using accounting data
from past projects, they determine what percentage of
the effort was expended on each activity. These
percentages serve as a baseline and are intuitively
adjusted to meet the expected demands of a new project.
For example, in the Wolverton model, total cost is
allocated into five major subareas: analysis cost
(20% of total), design cost (18.7% of total), coding
cost (21.7% of total), testing cost (28.3% of total)
and documentation cost (11.3% of total). Each of
these subarea costs is subdivided again, depending
upon the activities in the subareas. In this way,
each activity can be staffed according to its
individual budget. Allocation of time is determined by
history and good management intuition.
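
The empirical allocation can be sketched as a simple split of the total by historical shares. The shares below are the Wolverton percentages from the text, taking the analysis share as 20% (which makes the five shares sum to 100%); the function name and the sample total are hypothetical.

```python
# Subarea shares of total cost in the Wolverton model, as fractions.
WOLVERTON_SHARES = {
    "analysis": 0.20,
    "design": 0.187,
    "coding": 0.217,
    "testing": 0.283,
    "documentation": 0.113,
}


def allocate(total_cost, shares):
    """Split a total across activities according to baseline shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {activity: total_cost * share for activity, share in shares.items()}


budget = allocate(1_000_000.0, WOLVERTON_SHARES)
for activity, cost in budget.items():
    print(f"{activity:14s} ${cost:>10,.0f}")
```

In practice the shares would be adjusted by judgment before use, as the text notes.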

The theoretical approach attempts to justify
its resource expenditure curve by deriving it from
equations which model problem-solving behavior. In
other words, the resource model lays out the staffing
across time and within phases. We will refer to
this approach as the dynamic multi-variable model.
It is dynamic because the model produces a curve which
describes the variation of staffing level across time.
The model is multi-variable because it involves more
than one parameter.

Two models in this category will be discussed 
which differ in the assumptions they make. The first 
model, which is the most widely known and used, is the 
Putnam model (9). 

The model is based on a hardware development 
model (10) which noted that there are regular patterns 
of manpower buildup and phase-out independent of the 
type of work done. It is related to the way people 
solve problems. Thus, each activity could be plotted 
as a curve which grows and then shrinks with regard to 
staff effort across time. For example, the cycles in 
the life of a development engineering project look as 
follows:



Similar curves were derived by Putnam for software
cycles which are: planning, design and implementation,
testing and validation, extension, modification and
maintenance.
The theoretical basis of the model is that software
development is a problem-solving effort and design
decision-making is the exhaustion process. The various
development activities partition the problem space
into subspaces corresponding to the various stages
(cycles) in the life cycle. A set of assumptions is
then made about the problem subset: (1) the number of
problems to be solved is finite, (2) the problem-solving
effort makes an impact on and defines an environment
for the unsolved problem set, (3) a decision
removes one unsolved problem from the set (assumes
events are random and independent) and (4) the staff
size is proportional to the number of problems "ripe"
for solution. Because the model is theoretically based
(rather than empirically based) some motivation for the
equation is given. Consider a set of independent devices
under test (unsolved problem set) subject to
some environment (the problem-solving effort) which
generates shocks (planning and design decisions). The
shocks are destructive to the devices under test with
some conditional probability $p(t)$, and they occur
randomly and independently with some rate
parameter $\lambda$. Assume the distribution is Poisson and
let T be a random variable associated with the time
interval between shocks

    $$ \Pr(T > t) = \Pr(\text{no event occurs in interval } (0, t)) \tag{1} $$

where t = 0 is the time of the most recent shock.
Letting $p(t)$ be the conditional probability of a failure
given that a shock has occurred and $\lambda$ be the
Poisson rate parameter, then

    $$ \Pr(T > t) = e^{-\lambda \int_0^t p(x)\,dx} \tag{2} $$

and

    $$ \Pr(T \le t) = 1 - e^{-\lambda \int_0^t p(x)\,dx} \tag{3} $$

and the p.d.f. associated with (3) is

    $$ f(t) = \lambda\, p(t)\, e^{-\lambda \int_0^t p(x)\,dx}, \quad t > 0 $$

This leads to the class of Weibull distributions (known
in reliability work) with the physical interpretation
that the probability of devices succumbing to destructive
shocks is changing with time. Based upon observed
data on engineering design projects, a special case of
(3) can be used

    $$ y = F(t) = 1 - e^{-at^2} \tag{4} $$

where

    $$ p(t) = \alpha t \tag{5} $$

and

    $$ a = \lambda \alpha / 2 \tag{6} $$

Note that this implies engineers learn to solve problems
with an increasing effectiveness (i.e., familiarity with
the problems at hand leads to greater insight and
sureness). Parameter a consists of an insight generation
rate $\lambda$ and a solution finding factor $\alpha$. Equation (5) is
a special linear case of the family of learning curves:
$y = a x^b$.

Equation (4) is then the normalized form of the
life cycle equation. By introducing a parameter (K)
expressed in terms of effort, we get an effort curve,
the integral form of the life cycle equation

    $$ y = K (1 - e^{-at^2}) $$

where

y is the cumulative manpower used through
time t

K is the total manpower required by the cycle
stated in quantities related to the time
period used as a base, e.g., man-months/
month

a is a parameter determined by the time period
in which y' reaches its maximum value
(shape parameter)

t is time in equal units counted from the
start of the cycle

The life cycle equation (derivative form) is

    $$ y' = 2 K a t\, e^{-at^2} $$

where y' is the manpower required in time period t
stated in quantities related to the time period used
as a base and K is the total manpower required by the
cycle stated in the same units as y'.


The curve (called the Rayleigh Curve) represents the
manpower buildup. The sum of the individual cycle
curves results in a pure Rayleigh shape. Software
development is implemented as a functionally homogeneous
effort (single purpose). The shape parameter a depends
upon the point in time at which y' reaches its
maximum, i.e.

    $$ a = \frac{1}{2 t_d^2} $$

where $t_d$ is the time to reach peak effort. Putnam has
empirically shown $t_d$ corresponds closely to the design
time (time to reach initial operational capability).
Substituting for a we can rewrite the life cycle
equation as

    $$ y' = \frac{K}{t_d^2}\, t\, e^{-t^2 / (2 t_d^2)} $$
The equations given are for the entire life cycle.
To find development effort only, take

    $$ y = K (1 - e^{-at^2}) $$

substitute $a = 1/(2 t_d^2)$

    $$ y = K (1 - e^{-t^2 / (2 t_d^2)}) $$

then the development effort is the effort expended up to time $t_d$

    $$ y = K (1 - e^{-t_d^2 / (2 t_d^2)}) = K (1 - e^{-0.5}) = 0.3935\,K $$

or DE is approximately 40% of LC effort.
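
The Rayleigh relations can be checked numerically. The sketch below assumes the forms y = K(1 - e^{-at^2}) and y' = 2Kat e^{-at^2} with a = 1/(2 t_d^2); the function names and the sample values of K and t_d are illustrative.

```python
import math


def shape_parameter(t_d):
    """a = 1 / (2 * t_d**2), where t_d is the time of peak staffing."""
    return 1.0 / (2.0 * t_d ** 2)


def cumulative_effort(t, K, t_d):
    """Integral form: effort expended from the start through time t."""
    a = shape_parameter(t_d)
    return K * (1.0 - math.exp(-a * t ** 2))


def staffing_rate(t, K, t_d):
    """Derivative form: manpower applied per unit time at time t."""
    a = shape_parameter(t_d)
    return 2.0 * K * a * t * math.exp(-a * t ** 2)


K, t_d = 400.0, 3.0  # illustrative: 400 man-years, peak at 3 years
development_fraction = cumulative_effort(t_d, K, t_d) / K
print(f"fraction of life cycle effort spent by t_d: {development_fraction:.4f}")
print(f"peak staffing rate: {staffing_rate(t_d, K, t_d):.1f} man-years/year")
```

The fraction printed does not depend on K or t_d, which is why development effort can be stated as a fixed share of life cycle effort.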


The life cycle and development costs may be calculated
by multiplying the effort for that cycle by the staff
year cost

    $$ \$LC = K \cdot MC $$

where MC = mean cost (in dollars) per man year of
effort

K = total manpower (in man years) used
by the project

(Note: the equation neglects computer time,
inflation, overtime, etc.)

and

    $$ \$DEV = MC \cdot (0.3935\,K) \approx 0.4 \cdot \$LC $$

Putnam found that the ratio $K / t_d^2$ has an interesting
property. It represents the difficulty of a
system in terms of programming effort required to
produce it. He defines

    $$ D = K / t_d^2 $$

To illustrate how management decisions can influence
the difficulty of a project, assume a system
size of K = 400 MY and $t_d$ = 3 years. Then the
difficulty $D = 400 / 9 \approx 44.4$ man years per year squared.

Consider a management decision to cut the life
cycle cost of the system by 10%. Now, K = .9 * (400) =
360 MY and D = 360 / 9 = 40. This results in a 10%
decrease in assumed difficulty of the project. This
decision assumes the difficulty is less than it really
is, and the result is less product.

Now consider the more common case of attempted
time compression. Assume management makes a decision
to limit the expended effort to 400 MY, but wants the
system in 2.5 years instead of 3 years. Now, K =
400 MY, $t_d$ = 2.5 years, and D = 400 / 6.25 = 64 (a
44% increase). The result of shortening the natural
development time is a dramatic increase in the system
difficulty.

The Putnam model generates some interesting 
notions. Productivity is related to the difficulty 
and the state of technology; management cannot arbitra- 
rily increase productivity nor can it reduce develop- 
ment time without increasing difficulty. The tradeoff 
law shows the cost of trading time for people. 
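
The difficulty arithmetic in the examples above can be reproduced directly from D = K / t_d^2. The function name is hypothetical; the inputs are the figures used in the text.

```python
def difficulty(K, t_d):
    """Putnam's difficulty: total effort divided by development time squared."""
    return K / t_d ** 2


baseline = difficulty(400, 3)      # 400 MY delivered in 3 years
cost_cut = difficulty(360, 3)      # effort cut by 10%, same schedule
compressed = difficulty(400, 2.5)  # same effort, schedule cut to 2.5 years

print(f"baseline   D = {baseline:.1f}")
print(f"cost cut   D = {cost_cut:.1f} ({cost_cut / baseline - 1:+.0%})")
print(f"compressed D = {compressed:.1f} ({compressed / baseline - 1:+.0%})")
```

Because the schedule enters as a square, a six-month compression moves the difficulty far more than a 10% change in effort does.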


In deriving an alternate model, Parr (11) questions
the assumption of the Rayleigh equation that
the initially rising work rate is due to the linear 
learning curve which governs the skill available for 
solving problems. He argues that the skill available 
on a project depends on the resources applied to it 
and that the assumption confuses the intrinsic con- 
straints on the rate at which software can be developed 
with management's economically-governed choices about 
how to respond to these constraints. 


As an alternative to this assumption, his model 
suggests that the initial rate of solving problems is 
governed by how the problems in the project are re- 
lated, i.e., the dependencies between them. For 
example, the central phase of development is naturally 
suited to rapid rates of progress since that is when 
the largest number of problems are visible. Letting
$V(t)$ be the expected size of this set of visible
(available for solving) problems at time t, Parr's model
yields the equation

    $$ V(t) = \frac{A e^{-\gamma \alpha t}}{(1 + A e^{-\gamma \alpha t})^{1 + 1/\gamma}} $$

where

$\alpha$ is the proportionality constant relating the
rate of progress and the expected size of the
visible set

$A$ is a measure of the amount of work done on the
project before the project officially starts

$\gamma$ is a structuring index which measures how much
the development process is formalized and uses
modern techniques.


The curve represented by V(t) differs from the 
Rayleigh/Norden curve for y'(t) in two important ways. 

The Rayleigh curve is constrained to go through the 
origin; the Parr curve is not. Making y'(0) = 0 
corresponds to setting an official start date for the 
project. Before that point, the effort expended on 
the project is assumed to be minimal. In reality, there 
is often a good deal of work done before that date, in- 
cluding such activities as requirements analysis and 
feasibility studies. In Putnam’s environment, these 
were handled by a separate organization and could be 
ignored. Another factor that affects the problem space 
is past experience in the application area, or even 
more tangible is the influence of design or code taken 
from past projects. All of these have the effect of 
structuring the problem space at the beginning, so that 
more progress can be made early. The Parr curve accounts 
for this; the Putnam curve does not. See Fig. 1 for a
comparison of the two curves.


A second distinction between the two curves is 
the flexibility of where the point of maximum effort can 
come. By using a structuring index greater than one, 
this point of maximum effort can be delayed almost to 
acceptance testing and effort could still be drastically 
reduced before project completion. With the Rayleigh 
curve, a late point of maximum effort constrains the 
curve to have a slow buildup and almost no decay at 
the end. 
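
The two distinctions can be seen numerically. The sketch below assumes the Parr curve has the form V(t) = A e^{-gamma*alpha*t} / (1 + A e^{-gamma*alpha*t})^(1 + 1/gamma) and the Rayleigh curve the form y'(t) = (K / t_d^2) t e^{-t^2 / (2 t_d^2)}. The peak-time expression is derived here by setting dV/dt = 0 for that assumed form; it is not taken from Parr's paper, and all parameter values are illustrative.

```python
import math


def parr_visible(t, A, alpha, gamma):
    """Expected size of the visible problem set under the assumed Parr form."""
    u = A * math.exp(-gamma * alpha * t)
    return u / (1.0 + u) ** (1.0 + 1.0 / gamma)


def parr_peak_time(A, alpha, gamma):
    """Time of maximum V(t); dV/dt = 0 gives A * exp(-gamma*alpha*t) = gamma."""
    return math.log(A / gamma) / (gamma * alpha)


def rayleigh_rate(t, K, t_d):
    """Rayleigh staffing rate, which is forced through the origin."""
    return (K / t_d ** 2) * t * math.exp(-t ** 2 / (2.0 * t_d ** 2))


A, alpha, gamma = 20.0, 1.0, 1.0  # illustrative parameter values
t_peak = parr_peak_time(A, alpha, gamma)

print(f"Parr at t=0:     {parr_visible(0.0, A, alpha, gamma):.4f}")
print(f"Rayleigh at t=0: {rayleigh_rate(0.0, 400.0, 3.0):.4f}")
print(f"Parr peak at t = {t_peak:.2f}")
```

With these values the Parr curve starts above zero, reflecting work done before the official start, while the Rayleigh curve starts at zero.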


Parr does not say how to estimate the parameters 
for V(t) in terms of data the project manager would 
have on hand. This is a problem in doing resource 
estimation currently, but the model could use the ex- 
isting resource allocation schedule, based on early 
data points, to predict the latter part of the curve. 
The Parr model is only currently being tested on real
software for the first time and the results are not
yet available. The Rayleigh model, on the other hand,
has been used in many environments and has been quite 
successful on the whole. 

Single variable, theoretical. The two previous
theoretical models may be thought of as macro models 
in that the estimate of staffing levels relies on 
process oriented issues, such as total effort, schedule 
constraints, and the degree that structured methodol- 
ogy is used. Product oriented issues, such as source 
code, are not a factor. Most of the other models are 
less macro oriented in that they consider product 
characteristics, such as lines of code and input/ 
output formats. In this section, we will discuss 
another type of theoretical model, based upon lower 
level aspects of the product, which we will call a 
micro model. The particular model discussed here deals 
with the idea that some basic relationships hold with 
regard to the number of unique operators and operands 
used in solving a problem and the eventual effort and 
time required for development. This notion was pro- 
posed by Halstead as part of his software science (12). 
Here there is only one basic parameter — size — measured 
in terms of operators and operands. The model tran- 
scends methodology and environmental factors. Most of 
the work in this area has dealt with programs or algo- 
rithms of module size rather than with entire systems, 
but that appears to be changing. 

In the language of software science, measurable
properties of algorithms are

$n_1$ = number of unique or distinct operators in
an implementation

$n_2$ = number of unique or distinct operands in
an implementation

$f_{1,j}$ = number of occurrences of the jth most
frequent operator, $j = 1, 2, \ldots, n_1$

$f_{2,j}$ = number of occurrences of the jth most
frequent operand, $j = 1, 2, \ldots, n_2$

then the vocabulary of an algorithm is

    $$ n = n_1 + n_2 $$

and the implementation length is

    $$ N = N_1 + N_2 $$

where

    $$ N_1 = \sum_{j=1}^{n_1} f_{1,j}, \qquad N_2 = \sum_{j=1}^{n_2} f_{2,j}, \qquad N = \sum_{i=1}^{2} \sum_{j=1}^{n_i} f_{i,j} $$

Based only on the unique operators and operands,
the concept of program length N can be estimated as

    $$ \hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2 $$

$\hat{N}$ is actually the number of bits necessary to represent
all things that exist in the program at least once, i.e.,
the number of bits necessary to represent a symbol table.
Over a large set of programs in different environments,
it has been shown that $\hat{N}$ approximates N very well.

To measure the size of an algorithm, software
science transcends the variation in language and character
set by defining algorithm size (volume) as the
minimal number of bits necessary to represent the
implementation of the algorithm. For any particular case,
there is an absolute minimum length for representing
the longest operator or operand name expressed in bits.
It depends upon n, e.g., a vocabulary of 8 elements
requires 8 different designators, or $\log_2 8$ is the
minimal length in bits necessary to represent all individual
elements in a program. Thus, a suitable metric for size
of any implementation of any algorithm is $V = N \log_2 n$,
called volume.


of both binary digits and discriminations) we get 
E V 

T ° ~ = s ~ ■ Halstead empirically estimated S = 18 

for his_ environment , but this may vary from environment 
to environment. 

Software science metrics have been validated in a 
variety of environments but predominantly for module 
size developments. 


The most succinct form in which an algorithm can 
be expressed requires a language in which the required 
operation is already defined and implemented. The po- 
■"“tential volume, V*, is defined as 

V* - (N* + N*) log 2 (n* + n*) 

but minimal form Implies N* » n* and N* « n* because 
""'there should be no repetition. x The number of operators 
should consist of one distinct operator for the function 
lame and another to serve as an assignment or grouping 
symbol so n* « 2. Thus, V* « (2 + n*) log 
—where n* represents the number of differen 
parameters. Note: V* is considered a useful measure 

sf an algorithm's content. It is roughly related to the 
^aslc GRC model concept of input/output formats. In 
......'act, the GRC equation for man months of the project 

relationship between MMp RQG 


(MM? rog ) is an exponent 
and an estimate of V*. 


, (2 + n*) 
t input/output 


The level of the implementation of a program is 
defined as its relation to Its most abstract form, V*, 
i e V* 

L “ L < 1 and the most succinct expression 

for an algorithm has a level of 1. V* » L x V implies 
that when the volume goes up the level goes down. Since 
it is hard to calculate V*, an approximation for L, L, 
is calculated directly from an implementation 

~L. The reciprocal of level is defined as 

n l N 2 

the difficulty, D - 1/L, which can be viewed as the 
amount of redundancy within an implementation. 

Based on these primitives, formulas for programming
effort (E) and time (T) are derived. Assuming the
implementation of an algorithm consists of N selections
from a vocabulary of n elements and that the selection
is non-random and of the order of a binary search
(implying $\log_2 n$ comparisons for the selection of each
element), the effort required to generate a program is
$N \log_2 n$ mental comparisons (this is equal to the
volume (V) of the program). Each mental comparison
requires a number of elementary mental discriminations,
where this number is a measure of the difficulty (D) of
the task. Thus, the total number of elementary mental
discriminations E required to generate a given program
should be

$$E = V \times D = \frac{V}{L} = \frac{V^2}{V^*}$$

This says that the mental effort required to implement any
algorithm with a given potential volume should vary with
the square of its volume in any language. E has often been
used to measure the effort required to comprehend an
implementation rather than produce it, i.e., E may be a
measure of program clarity.
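
As a rough illustration of the primitives above, the following Python sketch computes volume, estimated level, difficulty, and effort from operator and operand counts. The function name `software_science` and the sample counts are hypothetical, and effort is computed with the estimated level in place of the true level, which is an assumption of this sketch rather than something the text prescribes.

```python
import math


def software_science(n1, n2, N1, N2, n2_star=None):
    """Compute software science measures (hypothetical helper).

    n1, n2  -- number of distinct operators / operands
    N1, N2  -- total occurrences of operators / operands
    n2_star -- optional number of different input/output parameters
    """
    length = N1 + N2                      # N, program length
    vocabulary = n1 + n2                  # n, vocabulary size
    volume = length * math.log2(vocabulary)
    level_estimate = (2 / n1) * (n2 / N2)  # approximation to L
    difficulty = 1 / level_estimate        # D = 1/L
    effort = volume * difficulty           # E = V * D (assumes estimated level)
    result = {
        "volume": volume,
        "level_estimate": level_estimate,
        "difficulty": difficulty,
        "effort": effort,
    }
    if n2_star is not None:
        # Potential volume: V* = (2 + n2*) log2 (2 + n2*)
        potential = (2 + n2_star) * math.log2(2 + n2_star)
        result["potential_volume"] = potential
        result["level"] = potential / volume   # L = V*/V
    return result


# Illustrative counts only; not taken from any measured program.
measures = software_science(n1=4, n2=4, N1=8, N2=8, n2_star=2)
print(measures)
```

When the number of input/output parameters is known, the true level L = V*/V can be compared with the estimate to see how far the approximation drifts for a given module.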

To calculate the time of development, software
science uses the concept of a moment, defined by the
psychologist Stroud as the time required by the human
brain to perform the most elementary discrimination.
These moments have been shown to occur at a rate of 5
to 20 per second. Denoting moments (or Stroud's number)
by S, we have $5 \le S \le 20$ per second. Assuming a
programmer does not "time share" while solving a problem,
and converting the effort equation (which has dimensions


Other resources. In what has been stated so far,
resource expenditure and estimation have been
predominantly computed in terms of effort. The formula for
cost may be a simple multiplication of the staff months 
times the average cost of a staff member or it may be 
more complicated. It may include some difference for 
the cost of managers versus the cost of programmers 
versus the cost of support personnel whose role varies 
across the life cycle (13). 

The schedule may be derived based upon historical 
data, with effort allocated to different activities 
based upon the known percentages or it may be dictated 
by the model itself, as with the Rayleigh curve.
However, the dynamic models generate what they consider the
ideal staffing conditions which may not be the actual
ones available. Thus, in fitting actual effort to the 
estimated or proposed effort, some decisions and trade- 
offs must be made. 

Computer time is yet another resource. Unfortunately,
none of the above models treats this within
the same formula. In general, they have a separate 
formula for computer time again based upon computer use 
in similar projects. These models vary from a simple 
table type model (6) to some very sophisticated proba- 
bility distribution based on reliability modeling for 
phases of the development, such as testing (14). 

Changes and Errors 

There are process aspects other than resource ex- 
penditures that provide information about managing and 
engineering the process and the product. One such 
aspect is the changes and errors generated during de- 
velopment or maintenance. Monitoring the changes in the 
software provides a measure of level of effort to get 
the product in order. If we can classify the types of 
changes that occur or their source of origin, we can 
categorize the environment and gain insight into how
to manage or minimize the effect of particular types of 
changes. For example, suppose the user is generating a 
series of major changes at a continual rate. This may 
provide management with the information it needs to 
reclassify the environment from its original one to a 
more complex one, permitting modification of the cost 
parameters in the resource estimation model and a re- 
estimation of cost part way through the project. It 
could also provide management with the necessary insight 
to change the development approach or methodology to one 
that is more insensitive to externally generated change, 
such as some incremental development approach. 

Monitoring errors provides information with regard 
to the quality of the product. A product developed 
with only a few errors or with errors found early and 
an error rate decreasing during development and testing 
will warrant more confidence in its quality. Keeping 
track of the time to find and fix errors gives insights 
into cost. Knowing the types of errors being made helps
in focusing attention on particular problems during the
code-reading and design-review sessions.

Program evolution measures. Belady and Lehman (15)
have examined the changes occurring in software during
maintenance and derived a set of laws for program
evolution. Based on such parameters as size of the
system, number of modules added, deleted or changed,
the release data, manpower, machine time and cost, they
derived the following laws:

1. Law of continuing change. A system that is
used undergoes continuing change until it is judged
more cost effective to freeze and recreate it.

2. Law of increasing entropy. The entropy of a
system (its unstructuredness) increases with time,
unless specific work is executed to maintain or reduce
it.

3. Law of statistically smooth growth. Growth
trend measures of global system attributes may appear
to be stochastic locally in time and space, but,
statistically, they are cyclically self-regulating with
well-defined long-range trends.

These laws can be demonstrated by using the following
metrics:

RSN, the release number

$D_R$, the age of the system at release R

$I_R$, the time between releases R-1 and R

$M_R$, the number of modules in the system

$MH_R$, the number of modules handled during release
interval $I_R$ (estimator of activity undertaken in
each release)

$HR_R = MH_R / I_R$, the handle rate

$C_R = MH_R / M_R$, the complexity, which is the fraction
of released system modules that were handled during
the course of release R.

$C_R$ has been observed to be monotonically increasing
and approaching unity over time (for OS 360,
approximately 20 releases over 10 years).

Using these metrics, management can predict when 
it is too costly to modify a system, i.e., when it is 
cheaper to redesign than make the next change. It 
can also determine whether enough effort is being de- 
voted to keep future changes at a reasonable cost. 
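
The release metrics above can be tabulated with a few lines of Python. This is only a sketch: the `Release` record, its field names, and the sample figures are hypothetical, and the release interval is assumed to be the difference between the system ages at consecutive releases.

```python
from dataclasses import dataclass


@dataclass
class Release:
    rsn: int              # release number (RSN)
    age_days: float       # age of the system at this release
    modules: int          # number of modules in the system
    modules_handled: int  # modules handled during the release interval


def evolution_metrics(releases):
    """Return (RSN, interval, handle rate, complexity) per release.

    The first release has no predecessor, so it yields no row.
    """
    rows = []
    for prev, cur in zip(releases, releases[1:]):
        interval = cur.age_days - prev.age_days        # assumed definition
        handle_rate = cur.modules_handled / interval   # modules handled per day
        complexity = cur.modules_handled / cur.modules  # fraction of modules handled
        rows.append((cur.rsn, interval, handle_rate, complexity))
    return rows


# Illustrative history only; not data from any real system.
history = [
    Release(rsn=1, age_days=0, modules=100, modules_handled=0),
    Release(rsn=2, age_days=100, modules=120, modules_handled=30),
    Release(rsn=3, age_days=200, modules=150, modules_handled=60),
]
for row in evolution_metrics(history):
    print(row)
```

A complexity value that keeps climbing toward 1 across releases is the signal the text describes: nearly every module is being touched for each change.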

Program-changes. Dunsmore and Gannon have proposed
a measure called program-changes which correlates very
highly with errors (16). A program-change is a textual
revision in the source code of a module during the
development period. One program-change should represent
one conceptual change to the program. Thus, a
program-change is defined as one or more changes to a single
statement, one or more statements inserted between
existing statements, or a change to a single statement
followed by the insertion of new statements. On the
other hand, the following are not counted as
program-changes: the deletion of one or more existing
statements, insertion of standard output statements or
special compiler-provided debugging directives, and
insertion of blank lines or comments. Basili and
Reiter showed that program-changes were minimal when a
good software development method was used (17).
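
One way to approximate the counting rules above is to diff two successive versions of a module. The sketch below uses Python's standard `difflib` and is not the procedure used by the cited authors. It assumes one statement per line and `#` comments, it does not try to recognize output or debugging statements, and it counts a changed hunk spanning several adjacent statements as a single program-change, which may undercount. The function names are hypothetical.

```python
import difflib


def _is_substantive(line):
    """True unless the line is blank or a comment (assumes '#' comments)."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def count_program_changes(old_lines, new_lines):
    """Approximate the number of program-changes between two versions."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    changes = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'replace' covers a changed statement, possibly followed by
        # insertions; 'insert' covers newly inserted statements.
        # 'delete' hunks are ignored, since deletions are not counted.
        if tag in ("replace", "insert"):
            if any(_is_substantive(line) for line in new_lines[j1:j2]):
                changes += 1
    return changes


old = ["a = 1", "b = 2", "c = 3"]
print(count_program_changes(old, ["a = 1", "b = 5", "c = 3"]))          # one statement changed
print(count_program_changes(old, ["a = 1", "c = 3"]))                   # deletion only
print(count_program_changes(old, ["a = 1", "# note", "", "b = 2", "c = 3"]))  # comment and blank line
```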

Error-day. An error-based measure of product
quality was proposed by Mills (18) which he called the 
error-day. The motivation is that the longer an error 
remains in the system the more expensive and less re- 
liable it is. The error-day measure is simply the sum 
over each error of the number of days it has existed 
within a system. It weights errors by their duration 
in the system. Clearly, a low error-day count is an 
indicator of a well-engineered program. This measure 
could be automated by using the concept of program- 
changes and plotting them against time. 
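
The error-day sum is simple to compute once each error has an introduction date and, if fixed, a removal date. The following Python sketch assumes that data layout; the function name and the sample dates are hypothetical.

```python
from datetime import date


def error_days(errors, as_of):
    """Sum, over all errors, the number of days each existed in the system.

    errors -- iterable of (introduced, removed) dates; removed is None
              for an error that is still present on the as_of date.
    """
    total = 0
    for introduced, removed in errors:
        end = removed if removed is not None else as_of
        total += (end - introduced).days
    return total


# Illustrative data only.
log = [
    (date(2024, 1, 1), date(2024, 1, 11)),  # existed for 10 days
    (date(2024, 1, 5), None),               # still open on the as_of date
]
print(error_days(log, as_of=date(2024, 1, 15)))
```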

Job-steps. An indication of the amount of effort
expended in development can be the number of computer 
accesses or job-steps. A computer job-step is a 
single programmer-oriented activity performed on a com- 
puter at the operating system command level, which is 
basic to the development effort and involves nontrivial 
expenditures of computer or human resources. Typical 
job-steps might be text editing, module compilation, 
link editing, and program execution. Basili and 
Reiter (17) found job-steps to be a serious differ- 
entiator of development environments, and that good 
methodology leads to a smaller number of job-steps. 

There exist many other measures of the software 
development process. The interested reader is re- 
ferred to some general references in the literature, 
e.g., Curtis (19), Mohanty (20), Belady (21). 

PRODUCT MEASURES 

Actually, all the previous measures could have 
been considered measures of the product. If a product 
takes a long time or a large effort to develop, we may 
consider it a complex product. If there were lots of 
errors found at the tail end of product development 
or if the rate of finding errors was increasing every 
day, we would say the quality of the product was very 
low. However, each of those indicated as much, if not 
more, about the process than the product. 

The measures discussed in this section are probes 
into the product. They are taken at a discrete point 
in time, usually on the final deliverable product. 

Even though examining the changes in value of the 
metrics on the product over time could be very informa- 
tive with regard to the process, we will classify them 
as product measures. We categorize these measures with 
respect to size, structure and reliability. 

Size 

The size of a product is a simplistic measure and 
easy to calculate. It is a reasonable indicator of the 
amount of work expended and correlates well with effort. 
Size metrics are used for cost estimation, comparison 
of products, and for measures of productivity. Although 
it may be a basic ingredient in effort and productivity 
measures, it must be modified by many other factors, 
such as reliability and complexity. These measures 
will be treated in subsequent sections. 

The most common measure of size is lines of code.
However, what gets measured depends to a great extent
on our interests. For example, if we are interested
in measuring effort, then source lines including
comments and data are a reasonable measure and have been
used in several studies (1, 2). If we are interested
in function size, a better approximation may be
executable statements. If our interest is in comparing
the size of resulting products for operational use, a
common denominator is the number of machine language
instructions. Clearly, there is little agreement on the
appropriate measure of lines of code, and the choice
should depend upon the issue under consideration. It
is important in reading the literature that we clearly
understand which measure of size is being used, since
the authors do not always make it clear.

Another measure of size is to treat units larger
than lines of code. One common unit is the module.
Modules are used in the measures of Belady and Lehman
(21) and were shown to be reasonable measures for cost
estimation by Freburger and Basili (2). Smaller units,
such as procedures or functions, were used by Basili
and Reiter (17). Again, the choice is dependent upon
the purpose of the measure. For estimation, it is
sometimes easier to predict the number of modules
rather than the number of lines. However, comparison
may be difficult since there is no standard definition
of module.

On the other end of the size spectrum is the number
of operators and operands as defined in software
science by Halstead (12). More specifically, the
length and volume measures are potential measures for
size of an implementation and size of the function,
respectively. There have been several studies that
support these metrics as reasonable approximations to
what they purport to measure. They make good metrics
for comparison and possible evaluation, but there is
potential for using them for estimation also.

Structure

The structure of a program is often a good indi- 
cator of whether that product is well designed, under- 
standable, and easy to modify. Structure measures 
are often proposed as measures of the complexity of 
the product. In examining structure, we may be con- 
cerned with the control structure, the data structure, 
or a mixture of the two. 

Control structure measures. The simplest control
structure metric is the number of decisions (17) as
measured by the number of constructs that represent
branches in the flow of control, such as if then else
or while do statements. There is a basic belief that
the more control flow branching there is in a system 
the more complex it is. A variation of this measure 
is the relative percentage of control flow branching, 
i.e., the number of decisions divided by the number of 
executable statements. Early studies by Aron (7) 
showed that varying levels of this type of complexity 
could account for a nine to one difference in 
productivity. 

A more refined measure of control complexity is 
cyclomatic complexity as proposed by McCabe (22). The 
cyclomatic complexity of a graph is defined as the 
number of edges minus the number of nodes plus the 
number of connected components, and is equal to the 
minimum number of basic paths from which all other paths 
may be constructed. Given a program in which all
statements are on a path from the entry node to an exit node,
the cyclomatic complexity can be defined as the number
of predicates plus the number of segments. A predicate
is defined as a simple Boolean expression governing
the flow of control and a segment is defined as an
individual routine (procedure or function).
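
The two formulations of cyclomatic complexity can be checked against each other on a small control flow graph. The Python sketch below is illustrative only: it assumes a single routine whose graph is connected, and it adds a virtual edge from the exit node back to the entry node so that the graph is strongly connected before applying edges minus nodes plus components. That virtual edge is an assumption of this sketch; without it the edge-based count comes out one lower than the predicate-based count. Function and node names are hypothetical.

```python
def cyclomatic_from_graph(edges, entry, exit_node):
    """Edges minus nodes plus connected components, for one routine.

    A virtual exit -> entry edge is added (assumption of this sketch)
    so that the graph is strongly connected.
    """
    all_edges = list(edges) + [(exit_node, entry)]
    nodes = {node for edge in all_edges for node in edge}
    components = 1  # one routine, assumed to form a single connected graph
    return len(all_edges) - len(nodes) + components


def cyclomatic_from_predicates(edges, segments=1):
    """Number of predicates plus number of segments.

    A node with k outgoing edges is treated as k - 1 simple predicates.
    """
    out_degree = {}
    for src, _dst in edges:
        out_degree[src] = out_degree.get(src, 0) + 1
    predicates = sum(d - 1 for d in out_degree.values() if d > 1)
    return predicates + segments


# An if-then-else followed by a while loop (illustrative graph).
cfg = [
    ("if", "then"), ("if", "else"),
    ("then", "while"), ("else", "while"),
    ("while", "body"), ("body", "while"),
    ("while", "exit"),
]
print(cyclomatic_from_graph(cfg, entry="if", exit_node="exit"))
print(cyclomatic_from_predicates(cfg))
```

Both functions return the same value for this graph, which is the equivalence the paragraph above describes for a program whose statements all lie on a path from entry to exit.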

The measure originated as a count of the minimum 
number of program paths to be tested. This is one
quantitative measure of a program's complexity. The 
measure is usually applied at the module level and 
McCabe proposed a cyclomatic complexity of ten as an 
upper bound for the safe range with regard to the com- 
plexity of a module. Several variations of the basic 
cyclomatic complexity measure have been studied by 
Basili and Reiter (23). They evaluated their sensi- 
tivity to different software development environments 
with reasonable success. They have also defined some 
approaches to using the measure at the product level
rather than the module level in a way that is reason- 
ably insensitive to system modularization. 

Other measures of control complexity involve the
weighting of various types of control structures as
to whether they are simple or complex, where simple
means easy to read and prove correct based upon the
graph structure. For example, single-entry single-exit
program graphs that contain a single predicate node
are easier to understand and abstract from than more
complicated graph structures. Thus, one approach
would be to weight various graph structures based upon
this complexity. This type of measure requires a more
detailed analysis of the program structure than does
the cyclomatic complexity measure, but tends to be a
deeper measure of control flow and can include other 
complexity factors, such as nesting level. One such 
measure is essential complexity (22), which assigns 
every program using only structured programming con- 
trol structures a complexity of one. 

Data structure measures. Data structure metrics
try to measure the complexity of the program structure
by the way the data is used, organized, and allocated.
Clearly, the easier it is for the reader to abstract
the use of data, the easier the program will be to
understand and modify. Several measures have been
used for evaluating the structuring of the data in a
program and a few will be discussed here.

The segment-global usage pair metric (24) attempts
to measure the goodness of the use of globals in the
program. A segment-global usage pair (p, r) is an
instance of a global variable r being used by a
segment p (i.e., r is either modified or accessed by p).
Each usage pair represents a unique "use connection"
between a global and a segment. Let actual usage pair
(AUP) represent the count of realized usage pairs, i.e.,
r is actually used by p. Let possible usage pair (PUP)
represent the count of potential usage pairs, i.e.,
given the program's globals and their scopes, the scope
of r contains p so that p could potentially modify or
access r. This represents a worst case. Then the
relative percentage usage pairs (RUP) is RUP = AUP/PUP
and is a way of normalizing the number of usage pairs
relative to the problem structure. The RUP metric is
an empirical estimate of the likelihood that an
arbitrary segment uses an arbitrary global.
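
A minimal Python sketch of the usage pair counts follows. It assumes the program has already been analyzed into two tables, one giving the globals each segment uses and one giving the segments within each global's scope; these tables, the function name, and the sample data are hypothetical.

```python
def usage_pair_metrics(uses, scope):
    """Return (AUP, PUP, RUP).

    uses  -- dict: segment name -> set of globals it modifies or accesses
    scope -- dict: global name -> set of segments within its scope
    """
    # Possible usage pairs: every (segment, global) with the segment in scope.
    pup = sum(len(segments) for segments in scope.values())
    # Actual usage pairs: those possible pairs that are realized.
    aup = sum(
        1
        for segment, used in uses.items()
        for name in used
        if segment in scope.get(name, set())
    )
    return aup, pup, aup / pup


# Illustrative program structure only.
scope = {"x": {"s1", "s2", "s3"}, "y": {"s1", "s2"}}
uses = {"s1": {"x", "y"}, "s2": {"x"}, "s3": set()}
print(usage_pair_metrics(uses, scope))
```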

The data binding metric (24, 25) is an attempt at
measuring the inter-relationship of modules or segments
within a program. A segment-global-segment data
binding (p, r, q) is an occurrence of the following:
(1) segment p modifies global variable r, (2) variable
r is accessed by segment q, and (3) $p \neq q$. The
existence of a data binding (p, r, q) implies that q is
dependent on the performance of p because of r. Binding
(p, r, q) does not equal binding (q, r, p). (p, r, q)
represents a unique communication path between p and q
and the total number of data bindings represents the
degree of a certain kind of "connectivity," i.e.,
between segment pairs via globals, within a complete
program. Let actual data bindings (ADB) represent the
absolute number of realized data bindings in the
program, i.e., the realized connectivity, and possible
data bindings (PDB) represent the absolute number of
potential data bindings given the program's global
variables and their declared scope (i.e., same worst
case). Then we can normalize the number of data
bindings by calculating the relative percentage RDB =
ADB/PDB. This gives some relative measure of the amount
of information exchanged in the program.
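
The data binding counts can be sketched in Python along the same lines. The input tables, the function name, and the sample data are hypothetical. The sketch also assumes that the possible data bindings for a global are all ordered pairs of distinct segments within its scope, which is one reading of the worst case described above.

```python
def data_binding_metrics(modifies, accesses, scope):
    """Return (ADB, PDB, RDB).

    modifies -- dict: segment -> set of globals it modifies
    accesses -- dict: segment -> set of globals it accesses
    scope    -- dict: global -> set of segments within its scope
    """
    actual = set()
    possible = 0
    for name, segments in scope.items():
        # Worst case (assumed): any segment in scope could modify the
        # global and any other segment in scope could access it.
        possible += len(segments) * (len(segments) - 1)
        for p in segments:
            for q in segments:
                if (
                    p != q
                    and name in modifies.get(p, set())
                    and name in accesses.get(q, set())
                ):
                    actual.add((p, name, q))
    return len(actual), possible, len(actual) / possible


# Illustrative program structure only.
scope = {"x": {"s1", "s2", "s3"}}
modifies = {"s1": {"x"}}
accesses = {"s1": {"x"}, "s2": {"x"}, "s3": {"x"}}
print(data_binding_metrics(modifies, accesses, scope))
```

Because (p, r, q) and (q, r, p) are distinct bindings, ordered pairs are counted in both the actual and the possible totals.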

A measure of the amount of data required to be
understood by the programmer while reading a program
is span (26). A span is the number of statements
between two consecutive textual references to the same
identifier. Thus, for n appearances of an identifier
in the source text, n-1 spans are measured. All
appearances are counted except those in declare statements.
If the span of a variable is greater than one hundred
statements, then one new item of information must be
remembered for a hundred statements until it is read
again. The complexity of the program would be the
number of spans at any point, i.e., the amount of data
the reader must be aware of when reading any particular
statement.
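
Span measurement can be sketched as follows in Python. The input is assumed to be a list in which entry i holds the identifiers referenced in statement i, with declarations already excluded. The span is taken here as the difference between statement positions, which is one interpretation of "number of statements between"; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def identifier_spans(statements):
    """Return dict: identifier -> list of spans between consecutive references.

    statements -- list of lists; statements[i] holds the identifiers
                  referenced in statement i, in textual order.
    """
    last_seen = {}
    spans = defaultdict(list)
    for index, identifiers in enumerate(statements):
        for name in identifiers:
            if name in last_seen:
                # Distance in statements since the previous reference.
                spans[name].append(index - last_seen[name])
            last_seen[name] = index
    return dict(spans)


# Illustrative reference listing only.
program = [["a"], ["b"], ["a", "b"], ["c"], ["a"]]
print(identifier_spans(program))
```

An identifier with n appearances yields n-1 spans, as stated above; an identifier referenced only once yields none.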

Control and data structure measures. There are
models of structure that address the integration of
control and data flow. One such model is slicing (27).
Informally, slicing reduces a program to a minimal
form which still produces a given behavior for a
subset of the data. The desired behavior is specified
as a projection from the program's original behavior.
For instance, if a program computes values for
variables X, Y, and Z, then one projection might be the
value of X at program termination. The minimal
program is obtained by eliminating program statements
which do not affect the projected behavior. The
result is a smaller program which contains only those
statements from the original program which affect the
selected behavior.

There are several possible metrics based on pro- 
gram slicing. These include (1) coverage, the ratio 
of slice length to program length; (2) overlap, a 
measure of the sharing of statements among different 
slices; (3) clustering, the percentage of statements 
in the slice which were adjacent in the original pro- 
gram; (4) parallelism, the number of almost disjoint 
slices; and (5) tightness, the ratio of statements 
found in every slice to total statements in the original 
program. Each of these metrics gives some view of the 
complexity of the program with respect to the control 
and data flow. 
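
Two of the slice-based metrics listed above, coverage and tightness, reduce to set arithmetic once the slices are known. The Python sketch below assumes each slice is given as a set of statement numbers from the original program; computing the slices themselves is outside its scope, and the function names and sample data are hypothetical.

```python
def coverage(slice_statements, program_length):
    """Ratio of slice length to program length."""
    return len(slice_statements) / program_length


def tightness(slices, program_length):
    """Ratio of statements found in every slice to total statements."""
    common = set.intersection(*(set(s) for s in slices))
    return len(common) / program_length


# Illustrative slices of a ten-statement program.
slices = [{1, 2, 3, 4, 5}, {1, 2, 6, 7}, {1, 2, 3, 8}]
print(coverage(slices[0], program_length=10))
print(tightness(slices, program_length=10))
```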

Reliability

Measuring the reliability of a product may involve
an analysis of (1) the distribution or classification
of errors, or (2) the execution of the product in a testing
or operational environment. Metrics involving the
distribution of errors can include the program-changes
and error-day metrics discussed earlier. Other metrics
involve distributions, such as fixes per line of code,
fixes per phase, errors per person hour, errors per
type of change causing the error, fixes per detection
and correction technique, etc. Weiss (28) has studied
various distributions in evaluating a development
methodology by showing a profile of the error
distributions made when using the methodology. Endres (29) used
error classification schemes to analyze the reliability
of a release of an operating system.

With regard to the operation of the program,
several reliability models have been proposed in the
literature (14, 30, 31, 32). Software reliability
here is defined as the probability that a given
software program operates for some time period without
software error which is detectable by executing the
code on the machine for which it was designed, given
that it is used within design limits. Reliability
measurement can be done for evaluation purposes as well
as for estimation purposes. The models measure reliability
as a function of calendar time, computer usage or
accumulated man hours and require parameters, such as the
error detection rate and the total number of errors in
the system, before testing. These estimates can be
based on theoretical assumptions or historical data.

A particular reliability model due to Shooman (30)
is based upon a set of assumptions, such as (1) the
operational software errors occur due to occasional
traversing of a portion of the program in which a
hidden software bug is lurking; (2) the probability
that a bug is encountered in the time interval $\Delta t$,
after t successful hours of operation, is proportional
to the probability that any randomly chosen instruction
contains a bug, i.e., the fractional number of
remaining bugs $\varepsilon_r$. Then the probability of a
failure during time interval $(t, t + \Delta t)$, given no
failures have occurred up until t, is proportional to the
failure rate z(t) (hazard function). Thus, the probability
of failure in interval $\Delta t$, given no previous
failure, is

$$P(t < t_f \le t + \Delta t \mid t_f > t) = z(t)\,\Delta t = K \varepsilon_r(\tau)\,\Delta t$$

where $t_f$ is operating time to failure, K is an arbitrary
constant, $\tau$ is the debugging time in man months, and t
is the operating time in hours. K can be estimated by
examining the history of errors detected, e.g.,

$$K \approx \frac{\text{\# catastrophic errors detected}}{\text{total \# errors detected}}$$

The probability of no system failure in the
interval (0, t) is given by the reliability function

$$R(t) = \exp\left(-\int_0^t z(x)\,dx\right)$$

assuming reliability is related to the failure rate.
Assuming K and $\varepsilon_r(\tau)$ are independent of
operating time t we get

$$R(t) = e^{-K \varepsilon_r(\tau) t} = e^{-K \{E_T/I_T - \varepsilon_c(\tau)\} t}$$

where $\varepsilon_c(\tau)$ is the number of corrected errors
(normalized by $I_T$ so that it is commensurate with
$E_T/I_T$), $E_T$ is the total number of initial bugs in the
program and $I_T$ is the number of instructions in the
program. This implies the probability of successful
operation without software bugs is an exponential function
of operating time.


A simpler way to summarize the results of the
reliability model is to compute the mean time to (software)
failure, MTTF, using the reliability function:

$$MTTF = \int_0^\infty R(t)\,dt = \frac{1}{K \{E_T/I_T - \varepsilon_c(\tau)\}}$$

[Figure: MTTF plotted against normalized time; axis ticks at 1/4, 1/2, 3/4.]

Note that the most improvement in MTTF occurs during the
last quarter of debugging.
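
The behavior noted above can be explored numerically. The Python sketch below evaluates the exponential reliability function and the corresponding mean time to failure for a constant failure rate. It assumes the corrected-error count is expressed per instruction, so that it can be subtracted from the initial bugs per instruction; the function names and all numeric values are hypothetical.

```python
import math


def failure_rate(k, initial_bugs, instructions, corrected_per_instruction):
    """Hazard rate z = K * (E_T / I_T - corrected errors per instruction)."""
    return k * (initial_bugs / instructions - corrected_per_instruction)


def reliability(t, z):
    """Probability of no failure in (0, t) for a constant failure rate z."""
    return math.exp(-z * t)


def mean_time_to_failure(z):
    """Mean of the exponential distribution: the integral of R(t) over t."""
    return 1.0 / z


# Illustrative values only: 100 initial bugs in 10,000 instructions, K = 0.5.
for fraction_corrected in (0.0, 0.25, 0.5, 0.75):
    corrected = fraction_corrected * 100 / 10_000
    z = failure_rate(0.5, 100, 10_000, corrected)
    print(fraction_corrected, round(mean_time_to_failure(z), 1),
          round(reliability(100, z), 3))
```

Because MTTF is the reciprocal of the remaining error density, each additional quarter of the errors corrected lengthens it by more than the previous quarter did, which matches the observation about the last quarter of debugging.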

Other models are based upon different assumptions,
but all yield some measure of the reliability of the
product.

The reader interested in other product measures
is again referred to some general references in the
literature (19, 20, 21).

PRODUCT MEASURES ACROSS TIME

As mentioned earlier, measures can be taken once
on the final product or at discrete intervals
throughout the life cycle. In this latter approach, metrics
can be used to monitor the stability and quality of
the product. By re-evaluating the metrics periodically,
we can see if the product is changing its character
in any way. This can provide feedback during development
and maintenance. For example, if we find that
over a period of time more and more control decisions
have entered the system, then something may have to be
done to counteract this change in character.

This approach is a way of providing a relativistic
evaluation of the product. As such, it is
easier to understand than an absolute measure. That
is, it may be more informative to know that each
change we make in the system increases the complexity
of the system, than to know the total complexity of
the system is some specific number. Here we need
only compare the values of the metrics with values of
the metrics on earlier versions of the system. The
drawback to an absolute measure is that we have
nothing to compare it to.

DATA COLLECTION

One major concern with performing measurement is
the ability to collect reliable data. Before we begin
collecting data, however, we must first understand
the various factors that characterize our environment.
We must isolate those factors we hope to control,
measure, and understand so that we may analyze their
effect.

With regard to the actual data collection process,
there are various approaches. Data collection can be
automated, meaning there is no interference to the
developers, or non-automated, meaning the data is
collected from the developers using forms or interviews.
Automated data collection tends to be more reliable
and can be done without the participants being aware
of what specific activities and factors are being
studied. Reporting forms and interviews can provide
more detailed insights into the process and give a
level of information that is not available in an
automated collection process, e.g., insights into the
kinds of errors committed.


Clearly, the data collected should be driven by 
the models and metrics we are interested in using; 
however, it doesn't hurt to add other data which may
give us information about refining and modifying those
models and metrics. All the data collected should be 
entered into a data base and validated, as much as 
possible, for easy reference and access. 

A first step in the validation of forms is a re- 
view of the forms as they are handed in; someone con- 
nected with the data collection process should ensure 
that the appropriate forms have been handed in and 
that the appropriate fields have been filled out. The 
data should be entered into the data base through a 
program that checks the validity of the data format 
and rejects data out of the appropriate ranges. For 
example, this program can assure that all dates are 
legal dates and that system component names and pro- 
grammer names are valid for the project by using a 
prestored list of component and programmer names. 
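
A data-entry check of the kind described above might look like the following Python sketch. The record layout, the field names, the date format, the range limit on hours, and the prestored name lists are all hypothetical; a real project would substitute its own form definitions.

```python
from datetime import datetime

# Hypothetical prestored lists for one project.
VALID_COMPONENTS = {"parser", "scheduler", "report_writer"}
VALID_PROGRAMMERS = {"adams", "baker", "chen"}


def validate_record(record):
    """Return a list of problems found in one form record (empty if none)."""
    problems = []
    try:
        # strptime rejects impossible dates such as February 30.
        datetime.strptime(str(record.get("date", "")), "%Y-%m-%d")
    except ValueError:
        problems.append("illegal date")
    if record.get("component") not in VALID_COMPONENTS:
        problems.append("unknown component")
    if record.get("programmer") not in VALID_PROGRAMMERS:
        problems.append("unknown programmer")
    hours = record.get("hours")
    # Assumed range: a single form reports between 0 and 24 hours.
    if not isinstance(hours, (int, float)) or not 0 <= hours <= 24:
        problems.append("hours out of range")
    return problems


good = {"date": "2024-03-01", "component": "parser",
        "programmer": "adams", "hours": 6}
bad = {"date": "2024-02-30", "component": "parser",
       "programmer": "zed", "hours": 30}
print(validate_record(good))
print(validate_record(bad))
```

Rejected records can be returned to whoever reviews the forms, so that the discrepancy counts mentioned in the next paragraph are gathered as a side effect of data entry.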

Ideally, all data in the data base should be
reviewed by individuals who know what the data should
look like. Clearly, this is expensive and not always
possible. However, several projects should be
reviewed in detail and the number and types of
discrepancies kept so that bounds can be calculated for the
unchecked data. This allows data to be interpreted
with the appropriate care.

Another type of validity check is to examine the
consistency of the data base by comparing redundant
data. For example, if effort data is collected both
at the budget level and at the individual programmer
level, there should be a reasonable correlation between
the two total efforts. Another approach is to use
cluster analysis to look for patterns of behavior that
are indicative of errors in filling out the forms.
For example, if all the change report forms filled out
by a particular programmer fall into one cluster, it
may imply that there is a bias in the data based upon
the particular programmer.

Data collection is a serious problem, especially 
on large programming projects involving character- 
istically different environments. One set of forms 
may not be enough to capture what is happening across 
all environments. However, if we are to use this data 
in models and metrics, we need to know how valid that 
data is in each case so as to avoid improper conclusions.

CONCLUSION 

Having fit the models to the data, we must analyze 
and interpret their results carefully. As stated 
earlier, we must understand the environmental parameters
under which the project was developed. We must know 
the assumptions, strengths and weaknesses of the models 
in order to interpret the results for the particular 
project. Our level of confidence in the particular 
model or metric should be based upon the level to which 
the model or metric has been tested. If the results 
support our intuition, we understand what the model 
means in our environment; if not, understanding the 
model's shortcomings can yield insights into the model 
and our environment. 

Quantitative support can be an excellent aid and
risk reducer in making a difficult management or
engineering decision. An organization should build
up its knowledge and expertise in quantitative analysis
of software development. In this way, confidence in
the various models and metrics can be acquired through
direct experience.